Biological Pattern Discovery with R Machine Learning Approaches (Zheng Rong Yang)

els are shown in Table 3.12.

Factor Xa protease cleavage data classification

An important feature of the decision tree algorithms is that they can handle non-numerical data. For instance, both CART and C5.0 can be applied directly, without any prior encoding, to the factor Xa protease cleavage data set [Yang, et al., 2006], which was composed of cleaved and non-cleaved peptides. Each peptide is a string of amino acids, which are non-numerical. The structure of the factor Xa protease sub-sequences was $R_4R_3R_2R_1R_1'$, where cleavage happens between $R_1$ and $R_1'$. These five residues were labelled as P1, P2, P3, P4 and P5 in the data, so that P1 corresponds to $R_4$ and P5 to $R_1'$. Figure 3.45(a) shows the CART tree model for this data set and Figure 3.45(b) shows the C5.0 tree model for this data set.
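As a minimal sketch of what applying a tree "directly" means, the code below fits a CART-style tree with the rpart package to peptide positions stored as factors, with no numerical encoding. The peptides, the labelling rule and all object names (`peptides`, `fit`, `train_accuracy`) are assumptions made up for illustration; this is not the factor Xa data set and not necessarily the code used to produce Figure 3.45.

```r
# Toy illustration only: the peptides and the labelling rule are made up,
# not the factor Xa protease cleavage data discussed in the text.
library(rpart)  # CART-style trees; a recommended package shipped with R

set.seed(1)
aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]  # the 20 amino acid letters
n  <- 500

# Five residue positions, each stored as a factor (no numerical encoding).
peptides <- data.frame(lapply(
  setNames(1:5, paste0("P", 1:5)),
  function(i) factor(sample(aa, n, replace = TRUE), levels = aa)
))

# Hypothetical rule used only to create class labels for the toy data.
peptides$class <- factor(ifelse(
  peptides$P4 %in% c("R", "K") | peptides$P1 %in% c("I", "L"),
  "cleaved", "non-cleaved"
))

# The formula interface accepts the factor predictors as they are.
fit <- rpart(class ~ P1 + P2 + P3 + P4 + P5, data = peptides,
             method = "class")

# Resubstitution accuracy on the toy data (optimistic by construction).
pred <- predict(fit, newdata = peptides, type = "class")
train_accuracy <- mean(pred == peptides$class)

print(fit)
print(train_accuracy)
```

Each split in the printed tree is a subset of amino acid letters at one position, which is how a tree handles a categorical predictor. The `C5.0()` function in the C50 package also offers a formula interface, so a comparable call should work on the same data frame if that package is installed.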

Figure 3.45 (a) The CART tree model and (b) the C5.0 tree model constructed for the factor Xa protease cleavage data.
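Tree models of this kind are commonly compared through the area under the ROC curve (AUC). As a hedged sketch, the AUC can be computed in base R from the ranks of the predicted scores, which is equivalent to the proportion of positive/negative pairs that are ordered correctly. The function name `auc_rank` and the six scores below are hypothetical and are not taken from the models in the text.

```r
# Rank-based AUC: the probability that a randomly chosen positive case
# receives a higher score than a randomly chosen negative case
# (tied scores count one half through average ranks).
auc_rank <- function(scores, is_positive) {
  stopifnot(length(scores) == length(is_positive))
  n_pos <- sum(is_positive)
  n_neg <- sum(!is_positive)
  stopifnot(n_pos > 0, n_neg > 0)
  r <- rank(scores)  # ties.method = "average" by default
  (sum(r[is_positive]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical predicted scores for six peptides (illustration only).
scores      <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.1)
is_positive <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)

auc_value <- auc_rank(scores, is_positive)
print(auc_value)
```

In this toy example 8 of the 9 positive/negative pairs are ordered correctly, so the value is 8/9. For a fitted classification tree, the scores would typically be the predicted probability of the "cleaved" class.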

Figure 3.46(a) shows the ROC curves as well as the AUC values for the CART and C5.0 models constructed for the factor Xa protease cleavage data, where the AUC values were 0.856 and 0.916 for the C5.0 and CART models, respectively. Figure 3.46(b) shows a sequence logo generated by the ggseqlogo package [Wagih, 2017]. Comparing the upper panel and the lower panel of Figure 3.46(b), it can be seen why P1 ($R_4$) and P5 ($R_1'$) were selected as the most discriminative variables. This is because the amino acid composition trends at these two residues demonstrated the greatest